System and method for generating a phrase pronunciation

ABSTRACT

A system and method for a speech recognition technology that allows language models to be customized through the addition of special pronunciations for components of phrases, which are added to the factory language models during customization. It allows components of a phrase to have different pronunciations inside customer-added phrases than are specified for those isolated components in the factory language models.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 11/069,203, filed Feb. 28, 2005, and claims priority from co-pending U.S. Provisional Patent Application Ser. No. 60/547,801, entitled “SYSTEM AND METHOD FOR GENERATING A PHRASE PRONUNCIATION,” filed Feb. 27, 2004, which co-pending application is hereby incorporated by reference in its entirety.

This application relates to co-pending U.S. patent application Ser. No. 10/413,405, entitled, “INFORMATION CODING SYSTEM AND METHOD”, filed Apr. 15, 2003; co-pending U.S. patent application Ser. No. 10/447,290, entitled, “SYSTEM AND METHOD FOR UTILIZING NATURAL LANGUAGE PATIENT RECORDS”, filed on May 29, 2003; co-pending U.S. patent application Ser. No. 10/448,317, entitled, “METHOD, SYSTEM, AND APPARATUS FOR VALIDATION”, filed on May 30, 2003; co-pending U.S. patent application Ser. No. 10/448,325, entitled, “METHOD, SYSTEM, AND APPARATUS FOR VIEWING DATA”, filed on May 30, 2003; co-pending U.S. patent application Ser. No. 10/448,320, entitled, “METHOD, SYSTEM, AND APPARATUS FOR DATA REUSE”, filed on May 30, 2003; co-pending U.S. patent application Ser. No. 10/953,448, entitled, “SYSTEM AND METHOD FOR DATA DOCUMENT SECTION SEGMENTATIONS”, filed on Sep. 30, 2004; co-pending U.S. patent application Ser. No. 10/953,474, entitled, “SYSTEM AND METHOD FOR POST PROCESSING SPEECH RECOGNITION OUTPUT,” filed on Sep. 29, 2004; co-pending U.S. patent application Ser. No. 10/953,471, entitled, “SYSTEM AND METHOD FOR MODIFYING A LANGUAGE MODEL AND POST-PROCESSOR INFORMATION”, filed on Sep. 29, 2004; co-pending U.S. patent application Ser. No. 10/951,291, entitled, “SYSTEM AND METHOD FOR CUSTOMIZING SPEECH RECOGNITION INPUT AND OUTPUT”, filed on Sep. 27, 2004; co-pending U.S. patent application Ser. No. 11/007,626, entitled “SYSTEM AND METHOD FOR ACCENTED MODIFICATION OF A LANGUAGE MODEL” filed on Dec. 8, 2004; co-pending U.S. patent application Ser. No. 10/787,889, entitled, “SYSTEM METHOD, AND APPARATUS FOR PREDICTION USING MINIMAL AFFIX PATTERNS”, filed on Feb. 27, 2004; and co-pending U.S. Provisional Patent Application 60/547,797, entitled, “SYSTEM AND METHOD FOR NORMALIZATION OF A STRING OF WORDS”, filed on Feb. 27, 2004, all of which co-pending applications are hereby incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

The present invention relates generally to a system and method for producing an optimal language model for performing speech recognition.

Today's speech recognition technology enables a computer to transcribe spoken words into computer-recognized text equivalents. Speech recognition is the process of converting an acoustic signal, captured by a transducive element, such as a microphone or a telephone, to a set of text words in a document. This process can be used for numerous applications including transcription, data entry and word processing. The development of speech recognition technology is primarily focused on accurate speech recognition, which is a formidable task due to the wide variety of pronunciations, phrases, accents, and speech characteristics. In particular, previous attempts to transcribe phrases accurately have been met with limited success.

The key to speech recognition technology is the language model. Today's state-of-the-art speech recognition tools utilize a factory (or out-of-the-box) language model, which is often customized to produce a site-specific language model. Further, site-specific users of speech recognition systems customize factory language models by including site-specific names and phrases. A site-specific language model might include, for example, the names of doctors, hospitals, or medical departments of a specific site using speech recognition technology. Unfortunately, factory language models include few names and phrases and previous attempts to provide phrase customization did not produce customized language models that accurately recognize phrases during speech recognition.

Previous efforts to solve this problem involved customizing a language model by adding phrases and corresponding phrase pronunciations to the language model. The phrase pronunciations for the added phrase were created as a combination of pronunciations of the components or elements of the phrase. As such, a phrase to be added to the language model would be initially broken down into components. For each component, the language model would be searched for a matching component and corresponding pronunciation. If all components were found in the language model, the corresponding pronunciations for each component of the phrase would be concatenated to form pronunciations of the new multi-word phrase. The new phrase was then added, together with its corresponding pronunciations, to the language model.

If any components were not found in the language model, a background dictionary was searched for the components. Any component tokens still not found in either the language model or the background dictionary were sent to a pronunciation guesser module, where component pronunciations were guessed based on their orthography (spelling). Phrase pronunciations were then formed for that phrase by combining all pronunciations from the language model, background dictionary, or guesser module. The new phrase was then added, together with its corresponding pronunciations, to the language model.

However, problems occur when phrase components are pronounced differently when part of a phrase. For example, the ampersand sign is pronounced as ‘and’ in a phrase but as ‘ampersand’ in the language model. Some previous systems attempted to solve this problem by adding additional pronunciations to problematic words instead of adding phrase pronunciations. Unfortunately, if “&” in the language model is given an additional pronunciation of ‘and’, then when an ordinary phrase such as “bacon and eggs” is dictated, it may be transcribed with an ampersand instead of an “and”. Conversely, if “&” is not given an additional pronunciation of ‘and’, then when the phrase “Brigham & Women's Hospital” is added to the language model, it would receive the pronunciation ‘Brigham ampersand women's hospital’ in the language model. This is a problem because ‘Brigham & Women's Hospital’ is actually pronounced as ‘Brigham and women's hospital.’

Additional problems occur when elements of a dictated phrase are not pronounced, that is, are silent. Previous systems failed to provide transcription for any silent or unspoken aspect of a phrase. For instance, a slash is used in many phrases but is silent when the phrase is spoken. For example, “OB/GYN” is a phrase pronounced ‘OBGYN’. However, under traditional systems, the slash would not be recognized or transcribed unless the dictator actually spoke ‘slash’, despite the fact that doctors and hospitals expect the transcribed text of a medical report to include the slash in “OB/GYN”.

Another problem with silent elements of a phrase involves well-known formatting or terms of the trade that are shortened or abbreviated for convenience when spoken. For example, the phrase “WISC (Revised)” is dictated for convenience in the medical fields as ‘WISC Revised’, without specifically dictating the parentheses around ‘Revised’. Traditional systems would require that the phrase in the language model have a pronunciation including the parentheses. This approach requires that the parentheses be awkwardly dictated in order for the automatic transcription to include the parentheses.

Additionally, traditional systems resulted in prohibitively large numbers of permutations of possible phrase pronunciations for many phrases. This is the result of each phrase component having multiple pronunciations in the language model. When combining the pronunciations from each phrase component, the number of possible combinations grows rapidly. Therefore, previous systems added a huge number of possible pronunciations for a long phrase where one or maybe two pronunciations would be sufficient for automatic recognition of a long phrase.
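By way of non-limiting illustration, the growth described above can be shown with a short calculation. The following is a minimal sketch only; the function name and the placeholder pronunciation labels are assumptions made for illustration and are not taken from any actual language model.

```python
from itertools import product
from math import prod

def count_phrase_pronunciations(component_prons):
    """Count the phrase pronunciations produced by naive concatenation."""
    return prod(len(prons) for prons in component_prons)

# Hypothetical pronunciation variants for each component of a
# four-component phrase (placeholder labels, not real transcriptions).
component_prons = [
    ["brigham-1", "brigham-2"],
    ["and-1", "and-2", "and-3", "and-4"],
    ["womens-1", "womens-2"],
    ["hospital-1", "hospital-2"],
]

total = count_phrase_pronunciations(component_prons)  # 2 * 4 * 2 * 2
all_variants = [" ".join(combo) for combo in product(*component_prons)]
print(total, len(all_variants))  # prints: 32 32
```

Under these assumed counts, a four-component phrase already yields 32 phrase pronunciations, and each additional multi-pronunciation component multiplies that number again.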

Previous systems also failed to identify context-based pronunciations in a phrase. For example, the phrases “St. Mulbery” and “Mulbery St.” contain the component ‘St.’ but the first phrase refers to a saint and the second phrase refers to a street. A typical language model includes both ‘street’ and ‘saint’ pronunciations for the component ‘St.’. Therefore, in previous systems when the phrase “St. Mulbery” was added to the language model, the system would inefficiently provide both the ‘saint Mulbery’ and ‘street Mulbery’ pronunciations.

Therefore, there exists a need for a speech recognition technology that updates a language model with phrases that can be accurately recognized and transcribed.

SUMMARY OF THE INVENTION

The present invention includes a system and method for a speech recognition technology that allows language models to be customized through the addition of phrase pronunciations, using special pronunciations for components of phrases. The steps of the method may include generating a list of pron components whose pronunciations differ when they occur in a phrase and assigning at least one pron to each pron component. The steps may also include determining the pronunciation of a phrase by tokenizing the phrase, that is, by generating a list of tokens corresponding to the phrase. Determining the phrase pronunciation may include determining a pron for each of the list of tokens and assembling the pronunciation of the phrase based upon a combination of each pron. Finally, the system may add the phrase and the pronunciation of the phrase to the language model.

Another aspect of the present invention may include identifying initial and non-initial tokens of a phrase. The present invention may include generating a phonetic transcription for each pron component based on a literal phonetic transcription or referencing a phonetic transcription from the language model.

Another aspect of the present invention may include determining a pron for each token by searching a pron component list. The pron component list may include both an initial pron component list and a non-initial pron component list.

Another aspect of the present invention may include searching the language model and/or the background dictionary for a pron. The present invention may also include a pron guesser for guessing the pron for a token.

In another aspect, the present invention includes a system for adding phrase pronunciations to a language model including a computer with a computer code mechanism for processing a list of pron components whose pronunciations differ when they occur in a phrase, assigning at least one pron to each pron component, determining the pronunciation of a first phrase by first tokenizing the first phrase by generating a list of tokens corresponding to the first phrase, then determining a pron for each of the list of tokens, then assembling the pronunciation of the first phrase based on a combination of each pron, and adding the first phrase and the pronunciation of the first phrase to a language model; a language model electronically accessible by the computer code mechanism; and a tokenizer for generating a list of tokens corresponding to the first phrase, the tokenizer being in electronic communication with the computer code mechanism. In some embodiments, the pron components list includes non-initial components. In some embodiments, the pron components list includes initial components.

In still another embodiment the system includes a background dictionary electronically accessible by the computer code mechanism, wherein the computer code mechanism searches the background dictionary to determine a pron for each token.

In another embodiment the system includes a pron guesser in electronic communication with the computer code mechanism, wherein the computer code mechanism applies the pron guesser to determine a pron for each token.

BRIEF DESCRIPTION OF THE DRAWINGS

While the specification concludes with claims particularly pointing out and distinctly claiming the present invention, it is believed the same will be better understood from the following description taken in conjunction with the accompanying drawings, which illustrate, in a non-limiting fashion, the best mode presently contemplated for carrying out the present invention, and in which like reference numerals designate like parts throughout the Figures, wherein:

FIG. 1 shows an architecture view of the system and method for modifying a language model in accordance with the prior art; and

FIG. 2 shows an architecture view of the system and method for modifying a language model in accordance with certain teachings of the present disclosure.

DETAILED DESCRIPTION

The present disclosure will now be described more fully with reference to the Figures in which an embodiment of the present disclosure is shown. The subject matter of this disclosure may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein.

Referring to FIG. 1, an architecture view shows a previously known system or method for the creation of a multiword phrase pronunciation and for the modification of a language model in accordance with the prior art. The method begins with step 10 initializing the steps of the system.

A list of phrases to be added to the language model is fed into the system in step 15. Each phrase from the input list is presented to the system in step 20, and proceeds all the way through to the end at step 85, at which point the pronunciations created for each phrase are added to the language model. The process is repeated for each phrase in the input list until pronunciations for all the phrases have been added to the language model.

In step 20, a phrase is compared against the language model to determine if the phrase already exists in the language model. If so, the pronunciation or pronunciations associated with the phrase are collected from the language model in step 25 and provided to step 75.

If the phrase is not located in the language model in step 20, the background dictionary is searched in step 30. If a match to the phrase is found in the background dictionary, the pronunciation or pronunciations associated with the phrase are collected from the background dictionary in step 35 and provided to step 75.

It should be noted that the language model may have multiple pronunciations associated with a given word or phrase. Likewise, the background dictionary may also have multiple pronunciations associated with a given word or phrase. Therefore, if a word or phrase is located in the language model or background dictionary, multiple pronunciations may be provided to step 75 for a given phrase or component of a phrase.

If the phrase is not found in either the language model or the background dictionary, the phrase is broken into smaller parts or phrasal components if possible. Step 40 determines if the phrase can be parsed into a first component and a second component at the first space or punctuation mark. Step 45 determines if the phrase includes more than one part and if so, step 50 begins a recursive loop on the first part or component of the phrase.

Step 50 sends the first component back to step 20 to initiate the loop on the first component. Step 20 determines if the first component exists in the language model. If a matching component is found in the language model, then the pronunciation of the first component is retrieved from the language model and delivered to step 75.

If a match is not found in the language model, then step 30 determines if the first component is in the background dictionary. If the first component is found in the background dictionary, then its pronunciation is retrieved from the background dictionary and delivered to step 75.

If a match is not found in either the language model or the background dictionary, then step 40 determines if the first component may be broken down any further into smaller components. As the first component was removed from the phrase on the initial pass through the system, the first component cannot be broken into smaller parts and therefore step 45 will determine that there is no more than one part of the first component.

When any phrasal component passing through the system cannot be broken into smaller parts and cannot be matched in either the language model or the background dictionary, the pronunciation of the phrasal component will be guessed in step 60. It should be noted that the pronunciation guesser in step 60 may guess multiple pronunciations and that those pronunciations will be passed forward to step 75.

Once pronunciations for the first component are delivered to step 75 from the language model, the background dictionary, or the pronunciation guesser, the recursive loop of step 50 is finished and the recursive loop of step 55 sends the second part of the phrase to step 20.

The second part passes through steps 20, 25, 30, and 35 as described above. If a pronunciation is found for the second part, then the pronunciation or pronunciations are delivered to step 75. However, if no pronunciations are found, then the second part is analyzed in step 40 to determine if the second part of the phrase contains smaller components that can be individually passed through the system in the same manner as the first component.

If the second part does not contain any smaller components and no match for the second part is found in either the language model or the background dictionary, then step 60 guesses the pronunciation of the second part. The guessed pronunciations are delivered to step 75. Step 75 combines the pronunciations from each phrasal component. Step 80 writes the phrase and the pronunciations to the language model and step 85 ends the system.

If the second part does contain multiple parts, then step 45 will determine that there is more than one part and proceed to step 50 where the first component of the second part will be sent to step 20. The recursive loops of steps 50 and 55 will repeat the above described steps with respect to FIG. 1, specifically repeating the recursive loop steps 50, 55 and 65 until each individual phrasal component is identified and corresponding pronunciations assigned and delivered to step 75.

When all the components or parts have corresponding pronunciations assigned and delivered to step 75, the pronunciations are combined. The pronunciations from the top level call and all recursive calls are combined in step 75 and added to the language model in step 80 to be used by subsequent passes through the system. Once the phrase and corresponding pronunciations are written to the language model in step 80, the system is ended in step 85.

It should be noted that when the pronunciations are combined in step 75, the number of phrase pronunciations could multiply very quickly if each component or part is associated with multiple corresponding pronunciations. Therefore, the number of permutations of possible phrase pronunciations to be written to the language model may be prohibitively large for a long multi-part phrase with multiple pronunciations for each part of the phrase.
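By way of non-limiting illustration, the recursive flow of FIG. 1 may be sketched as follows. This is a simplified sketch under stated assumptions, not the prior art implementation itself: the language model and background dictionary are modeled as plain dictionaries, the function names and spelled-out pronunciations are hypothetical, the guesser of step 60 is a stand-in, and the phrase is split at white space only rather than at punctuation.

```python
from itertools import product

def guess_pron(component):
    """Hypothetical stand-in for the pronunciation guesser of step 60."""
    return ["<guess:" + component.lower() + ">"]

def prior_art_prons(phrase, language_model, background_dictionary):
    """Collect every pronunciation the FIG. 1 flow would assign to `phrase`."""
    if phrase in language_model:                      # steps 20 and 25
        return list(language_model[phrase])
    if phrase in background_dictionary:               # steps 30 and 35
        return list(background_dictionary[phrase])
    parts = phrase.split(None, 1)                     # step 40 (white space only)
    if len(parts) < 2:                                # step 45: only one part
        return guess_pron(phrase)                     # step 60
    first, rest = parts
    first_prons = prior_art_prons(first, language_model, background_dictionary)  # step 50
    rest_prons = prior_art_prons(rest, language_model, background_dictionary)    # step 55
    # Step 75: every pronunciation of the first part is combined with
    # every pronunciation of the remainder.
    return [a + " " + b for a, b in product(first_prons, rest_prons)]

# Hypothetical entries, spelled out rather than phonetically transcribed.
language_model = {
    "Brigham": ["brigham"],
    "&": ["ampersand"],
    "Hospital": ["hospital", "hospitl"],
}
background_dictionary = {"Women's": ["womens", "wimmins"]}

result = prior_art_prons("Brigham & Women's Hospital",
                         language_model, background_dictionary)
print(len(result))   # prints: 4
print(result[0])     # prints: brigham ampersand womens hospital
```

In this sketch every resulting pronunciation contains ‘ampersand’, and the two components with two pronunciations each already produce four phrase pronunciations.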

Referring to FIG. 2, an architecture view shows a system or method for the creation of a multiword phrase pronunciation and for the modification of a language model in accordance with an embodiment of the present invention. The method begins with step 100 initializing the steps.

As with the system shown in FIG. 1, an input list of phrases to be added to the language model is provided to the system in step 101. It should be noted that the phrases may be entered on an individual basis or entered as a group, sequentially passing through the system.

Each phrase from the input list is presented to the system in step 102, and proceeds through the system to the end at step 135, at which point the pronunciations for each phrase are written to the language model. The process is repeated for each phrase in the input list until pronunciations have been added to the language model for every phrase in the input list.

In step 102, a phrase is compared against the language model to determine if the phrase already exists in the language model. If so, the pronunciation or pronunciations associated with the phrase are collected from the language model in step 103 and provided to step 130.

If the phrase is not located in the language model in step 102, the background dictionary is searched in step 104. If a match to the phrase is found in the background dictionary, the pronunciation or pronunciations associated with the phrase are collected from the background dictionary in step 105 and provided to step 130.

If the phrase is not found in either the language model or the background dictionary, a tokenizer breaks up the phrase into phrasal components or ‘tokens’ in step 110. These tokens are delivered to step 120, where a loop begins that sequentially processes each token of the phrase.

It should be noted that the tokenizer parses a phrase according to certain rules. Primarily, the tokenizer breaks up a phrase into phrasal elements or tokens at certain boundaries, looking for the longest match in the language model or background dictionary. For instance, the phrase “ham & eggs” has three phrasal elements and the tokenizer would break the phrase up into three tokens: “ham,” “&,” and “eggs.” However, the phrase “San Francisco Chronicle” contains two phrasal elements: “San Francisco” and “Chronicle.” The element “San Francisco” is one element because a match exists in the language model for “San Francisco.” The tokenizer may also parse a phrase simply by white space or punctuation.
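By way of non-limiting illustration, a longest-match tokenizer of the kind described above may be sketched as follows. The sketch is an assumption-laden example rather than the tokenizer of step 110 itself: the function name, the regular expression used to find boundaries, and the set `known_entries` (standing in for the entries of the language model and background dictionary) are all hypothetical.

```python
import re

def tokenize(phrase, known_entries):
    """Split `phrase` into tokens, preferring the longest known multi-word entry."""
    # Words keep internal apostrophes and periods; any other punctuation
    # mark becomes a word of its own.
    words = re.findall(r"[A-Za-z0-9'.]+|[^\sA-Za-z0-9]", phrase)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest run of words starting at position i first.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            # A single word is always accepted as the token of last resort.
            if j - i == 1 or candidate in known_entries:
                tokens.append(candidate)
                i = j
                break
    return tokens

# Hypothetical entries standing in for the language model and dictionary.
known_entries = {"San Francisco", "ham", "eggs"}

print(tokenize("ham & eggs", known_entries))
# prints: ['ham', '&', 'eggs']
print(tokenize("San Francisco Chronicle", known_entries))
# prints: ['San Francisco', 'Chronicle']
```

Accepting any single word as a last resort guarantees that the loop advances, so unknown words still become tokens that later lookups or a guesser can handle.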

Step 120 controls the loop over the tokens produced by the tokenizer. Each token is provided in turn to step 125. Step 125 determines whether any tokens have not yet passed through the system. If a token has not passed through the system, the token is delivered to step 140. If every token has passed through the system, step 125 directs the system to step 130.

For each token, a pron component list is searched for a match. The pron component list includes pron components or tokens that are pronounced differently when part of a phrase. The pron component list includes these tokens and corresponding pronunciations. The corresponding pronunciations in the pron component list, language model, and background dictionary are referred to as prons. The prons located in the pron component list are the pronunciations of how tokens are pronounced in a phrase. For example, the token “&” would have a pron of ‘and’ in the pron component list but a pronunciation of ‘ampersand’ in the language model.

The pron component list may also include components that are not pronounced differently but require fewer pronunciations to be recognized by a speech recognition system when part of a phrase. For example, “and” only needs one, maybe two, pronunciations to be recognized as part of a phrase as opposed to the many more pronunciations that are typically found in a language model for the token ‘and’. Therefore, the token ‘and’ may be present in the pron component list with only one pron of ‘and’. The pron component list may also include punctuation or formatting that is present in the text of the phrase but is silent in the spoken phrase. In this situation, if the phrase ‘OB/GYN’ was a phrase to be added to the language model, the token ‘/’ would have a silent pron.
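By way of non-limiting illustration, a pron component list with in-phrase prons, reduced pron sets, and silent prons may be represented as in the following sketch. The dictionary layout, the use of an empty string for a silent pron, the helper name, and the spelled-out pronunciations are assumptions made for this example only.

```python
SILENT = ""  # assumed representation of a silent pron

# Hypothetical pron component list: token -> prons used inside a phrase.
PRON_COMPONENT_LIST = {
    "&": ["and"],      # spoken as 'and' inside a phrase, not 'ampersand'
    "and": ["and"],    # a single pron instead of the many in a language model
    "/": [SILENT],     # present in the text, silent when spoken
    "(": [SILENT],
    ")": [SILENT],
}

def join_prons(prons):
    """Join one pron per token into a phrase pronunciation, skipping silent prons."""
    return " ".join(p for p in prons if p != SILENT)

tokens = ["OB", "/", "GYN"]
# Hypothetical prons for the tokens that are not pron components.
other_prons = {"OB": "oh bee", "GYN": "gee why en"}

chosen = [
    PRON_COMPONENT_LIST[t][0] if t in PRON_COMPONENT_LIST else other_prons[t]
    for t in tokens
]
print(join_prons(chosen))  # prints: oh bee gee why en
```

The written form of the phrase keeps the slash, while the pronunciation assembled for it contains no spoken counterpart for that token.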

It should be noted that prons may be specified in the pron component list as literal phonetic transcriptions of their corresponding tokens, or prons may reference their corresponding tokens in the language model, where the phonetic transcription is looked up by referencing that token in the language model.
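By way of non-limiting illustration, the two ways of specifying a pron may be sketched as follows. The class names `Literal` and `Reference`, the function `resolve_prons`, and the phoneme strings are hypothetical and are used here only to make the distinction concrete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A pron given directly as a phonetic transcription."""
    transcription: str

@dataclass(frozen=True)
class Reference:
    """A pron looked up under another token in the language model."""
    token: str

def resolve_prons(token, pron_component_list, language_model):
    """Return the phonetic transcriptions for a pron component."""
    resolved = []
    for spec in pron_component_list[token]:
        if isinstance(spec, Literal):
            resolved.append(spec.transcription)
        else:
            # A reference contributes every transcription stored for the
            # referenced token; a missing token raises KeyError.
            resolved.extend(language_model[spec.token])
    return resolved

# Hypothetical data; the phoneme strings are illustrative only.
language_model = {
    "and": ["AE N D", "AH N D"],
    "&": ["AE M P ER S AE N D"],
}
pron_component_list = {
    "&": [Reference("and")],   # pronounce "&" like the entry for "and"
    "/": [Literal("")],        # literal silent pron
}

print(resolve_prons("&", pron_component_list, language_model))
# prints: ['AE N D', 'AH N D']
```

A reference keeps the pron component list in step with the language model, whereas a literal transcription allows a pron that has no entry in the language model at all.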

To provide additional recognition accuracy, an initial pron component list may be searched for a match to the first token of every phrase. This initial pron component list may be used to identify the unique pronunciations of tokens when they occur at the start of a phrase. Therefore, the pron component list and the initial pron component list may be substantially identical except for those tokens that have different prons when they occur at the start of a phrase. For example, ‘St.’ is a token that changes prons depending on whether the token occurs at the start of the phrase. ‘St.’ has a pron of ‘saint’ when it occurs at the start of a phrase and ‘street’ or ‘saint’ when it occurs elsewhere in a phrase.
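By way of non-limiting illustration, selecting between the initial pron component list and the pron component list based on token position may be sketched as follows. The names and dictionary layout are assumptions for this example; a result of `None` simply indicates that the token is not a pron component and would be looked up elsewhere.

```python
# Hypothetical lists: token -> prons, keyed by position in the phrase.
INITIAL_PRON_COMPONENTS = {"St.": ["saint"]}
PRON_COMPONENTS = {"St.": ["street", "saint"]}

def component_prons(token, position):
    """Return the prons for `token` at `position`, or None if it is not listed."""
    table = INITIAL_PRON_COMPONENTS if position == 0 else PRON_COMPONENTS
    return table.get(token)

def lookup_phrase(tokens):
    """Look up every token of a phrase in the position-appropriate list."""
    return [component_prons(token, i) for i, token in enumerate(tokens)]

print(lookup_phrase(["St.", "Mulbery"]))
# prints: [['saint'], None]
print(lookup_phrase(["Mulbery", "St."]))
# prints: [None, ['street', 'saint']]
```

With this arrangement the phrase “St. Mulbery” no longer receives a ‘street Mulbery’ pronunciation.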

The embodiment of FIG. 2 utilizes an initial pron component list. However, the system shown in FIG. 2 might also be accomplished with only a pron component list and remain within the scope of the invention. Therefore in FIG. 2, step 140 determines if the token passing through the loop is the first token in the phrase. If so, then the first token is delivered to step 150 where a list of initial pronunciation components or ‘pron’ components may be searched to determine if the first token is in the initial pron component list. If a match to the first token is found, then the corresponding initial pron component is retrieved from the initial pron component list in step 155 and added to a global set of prons being collected for each of the tokens in the phrase in step 160.

If the first token of the phrase is not located in the initial pron component list, then the first token is delivered to step 181. Step 181 determines if the first token is in the language model and if so, retrieves the pronunciations from the language model in step 182. Step 183 adds the pronunciations to the global set of prons being collected for each of the tokens in the phrase.

If the first token is not located in the language model, then the first token is delivered to step 185. Step 185 determines if the first token is in the background dictionary and if so, retrieves the pronunciations from the background dictionary in step 190. Step 195 adds the pronunciations to the global set of prons being collected for each of the tokens in the phrase.

If a match is not found in the initial pron component list or the language model or the background dictionary, then step 200 guesses the pronunciation for the first token. Step 205 adds the guessed pronunciation to the global set of prons being collected for each of the tokens in the phrase.

Once the first token is assigned a pronunciation by the system, steps 165 and 120 return the system to step 125 where the second token proceeds through the system. Step 140 determines that the second token should proceed to step 170, which determines if the second token is present in the pron component list. If a match of the second token is found in the pron component list, then the corresponding pron is retrieved from the pron component list in step 175 and added to the global set of prons being collected for each of the tokens in the phrase in step 180.

If the second token is not located in the pron component list, then the second token is delivered to step 181. Step 181 determines if the second token is in the language model and if so, retrieves the pronunciation from the language model in step 182. Step 183 adds the pronunciation to the global set of prons being collected for each of the tokens in the phrase.

If the second token is not located in the language model, then the token is delivered to step 185. Step 185 determines if the token is in the background dictionary and if so, retrieves the pronunciation from the background dictionary in step 190. Step 195 adds the pronunciation to the global set of prons being collected for each of the tokens in the phrase.

If a match is not found in the pron component list or the language model or the background dictionary, then step 200 guesses the pronunciation for the second token. Step 205 adds the guessed pronunciation to the global set of prons being collected for each of the tokens in the phrase.
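By way of non-limiting illustration, the per-token lookup order described with reference to FIG. 2 may be sketched as follows. This is a minimal sketch under stated assumptions: each source is modeled as a plain dictionary, the function names and spelled-out pronunciations are hypothetical, and the guesser of step 200 is a stand-in that merely labels its output.

```python
def guess_pron(token):
    """Hypothetical stand-in for the pronunciation guesser of step 200."""
    return ["<guess:" + token.lower() + ">"]

def prons_for_token(token, is_first, initial_list, pron_list,
                    language_model, background_dictionary):
    """Return the prons for one token using the FIG. 2 lookup order."""
    # Step 140: the first token uses the initial pron component list
    # (step 150); every other token uses the pron component list (step 170).
    component_list = initial_list if is_first else pron_list
    # Then the language model (step 181), then the background
    # dictionary (step 185).
    for source in (component_list, language_model, background_dictionary):
        if token in source:
            return list(source[token])
    # Step 200: nothing matched, so the pronunciation is guessed.
    return guess_pron(token)

# Hypothetical data, spelled out rather than phonetically transcribed.
initial_list = {"St.": ["saint"]}
pron_list = {"&": ["and"], "St.": ["street", "saint"]}
language_model = {"&": ["ampersand"], "Brigham": ["brigham"]}
background_dictionary = {"Women's": ["womens"]}

args = (initial_list, pron_list, language_model, background_dictionary)
print(prons_for_token("Brigham", True, *args))   # prints: ['brigham']
print(prons_for_token("&", False, *args))        # prints: ['and']
print(prons_for_token("Women's", False, *args))  # prints: ['womens']
print(prons_for_token("Mulbery", False, *args))  # prints: ['<guess:mulbery>']
```

Because the component list is consulted before the language model, the in-phrase pron ‘and’ takes precedence over the isolated pronunciation ‘ampersand’ for the token “&”.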

Once the second token is assigned a pronunciation by the system, steps 165 and 120 return the loop to step 125. Step 125 determines whether there are additional tokens in the phrase that have not passed through the system shown in FIG. 2. It should be noted that each additional token of the phrase passes through the system in the same manner as described above with respect to the second token. It should also be noted that the system may perform as many loops as necessary to process every token in the phrase and compile a pronunciation for every token in the phrase. For example, a phrase with four tokens will make four loops through the system and a phrase with ten tokens will make ten loops through the system.

It should be noted that as prons are added to the global set of prons being collected for each token of the phrase, the pronunciation for the phrase is combined token by token. Once every token is assigned a corresponding pronunciation, a pronunciation for the entire phrase is created from the combined pronunciations, and there are no additional tokens to be processed, step 125 will indicate that the system is finished and deliver the phrase and corresponding phrase pronunciations to step 130. Step 130 will then write the phrase and the corresponding phrase pronunciation to the language model for use during automatic speech recognition. After the language model is updated, the system ends with step 135.
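By way of non-limiting illustration, the overall flow of FIG. 2 for a single phrase may be sketched as follows. The sketch makes several assumptions that are not part of the disclosure: the language model and other sources are plain dictionaries, the tokenizer is reduced to a regular expression without longest-match lookup, a silent pron is an empty string, the guesser is a stand-in, and all names and spelled-out pronunciations are hypothetical.

```python
import re
from itertools import product

def guess_pron(token):
    """Hypothetical stand-in for the pronunciation guesser of step 200."""
    return ["<guess:" + token.lower() + ">"]

def add_phrase(phrase, language_model, background_dictionary,
               initial_list, pron_list):
    """Determine the pronunciations of `phrase` and write them to the model."""
    if phrase in language_model:                        # steps 102 and 103
        return language_model[phrase]
    if phrase in background_dictionary:                 # steps 104 and 105
        language_model[phrase] = list(background_dictionary[phrase])
        return language_model[phrase]
    # Step 110: simplified tokenizer (white space and punctuation only).
    tokens = re.findall(r"[A-Za-z0-9'.]+|[^\sA-Za-z0-9]", phrase)
    per_token = []
    for index, token in enumerate(tokens):              # steps 120 and 125
        component_list = initial_list if index == 0 else pron_list  # step 140
        for source in (component_list, language_model, background_dictionary):
            if token in source:
                per_token.append(list(source[token]))
                break
        else:
            per_token.append(guess_pron(token))         # step 200
    # Combine one pron per token, dropping silent (empty) prons.
    combined = [" ".join(p for p in combo if p) for combo in product(*per_token)]
    language_model[phrase] = list(dict.fromkeys(combined))  # step 130
    return language_model[phrase]

# Hypothetical data, spelled out rather than phonetically transcribed.
language_model = {"Brigham": ["brigham"], "&": ["ampersand"],
                  "Hospital": ["hospital"], "OB": ["oh bee"],
                  "GYN": ["gee why en"]}
background_dictionary = {"Women's": ["womens"]}
initial_list = {"St.": ["saint"]}
pron_list = {"&": ["and"], "/": [""], "St.": ["street", "saint"]}

sources = (language_model, background_dictionary, initial_list, pron_list)
print(add_phrase("Brigham & Women's Hospital", *sources))
# prints: ['brigham and womens hospital']
print(add_phrase("OB/GYN", *sources))
# prints: ['oh bee gee why en']
```

In this sketch each phrase receives a single pronunciation, with ‘and’ in place of ‘ampersand’ and no spoken counterpart for the slash, while the written form stored in the language model keeps both symbols.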

It should be noted that after a phrase and corresponding phrase pronunciations are written to the language model, the next phrase from the input list is processed from step 102 to step 135. Multiple phrases may be processed and automatically assigned pronunciations until each phrase in the input list is assigned pronunciations and written in the language model. Thus, phrases may be individually added to the language model as described above with reference to FIG. 2 or multiple phrases may be added to the language model at one time by repeating step 102 through step 135 for each phrase in the input list.

A computer system for implementing the methods described above will now be described. Such a computer system has a computer with a computer code mechanism capable of processing a list of pron components whose pronunciations differ when they occur in a phrase. The computer code mechanism assigns at least one pron to each pron component. The computer code mechanism then determines the pronunciation of a phrase by providing the phrase to a tokenizer in electronic communication with the computer code mechanism. The computer code mechanism then determines a pron for each of the list of tokens provided by the tokenizer and assembles the pronunciation of the phrase based on a combination of each of the prons. The computer code mechanism then adds the phrase and the pronunciation of the phrase to a language model electronically accessible by the computer code mechanism.

Optionally, the computer code mechanism may be capable of generating a phonetic transcription for each pron component when assigning a pron to each pron component. In generating a phonetic transcription, the computer code mechanism optionally may reference an item in the language model. The computer code mechanism optionally may specify a literal phonetic transcription when generating a phonetic transcription.

Optionally, the computer code mechanism may also be capable of processing a pron component list containing initial or non-initial components.

The computer system also includes a language model electronically accessible by the computer code mechanism. After the computer code mechanism completes determining the pronunciation of a phrase, the computer code mechanism adds the phrase and its pronunciation to the language model. Optionally, the language model may be capable of being referenced by the computer code mechanism when the computer code mechanism generates a phonetic transcription. The language model optionally may be capable of being searched by the computer code mechanism in order to determine a pron.

The computer system further includes a tokenizer. The tokenizer is in electronic communication with the computer code mechanism and generates a list of tokens corresponding to a phrase provided by the computer code mechanism. The tokenizer then provides the list of tokens to the computer code mechanism. The tokenizer may also identify an initial or a non-initial token.
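A minimal tokenizer of this kind may be sketched in Python as follows. The sketch assumes that token boundaries fall at white space and punctuation and that each punctuation mark is kept as its own token; the `Token` type and the regular expression are illustrative assumptions, not the tokenizer of any particular embodiment.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    is_initial: bool  # True only for the first token of the phrase


def tokenize(phrase: str) -> List[Token]:
    """Split a phrase at white space and punctuation (illustrative rules)."""
    # Runs of word characters form one token; each punctuation mark is its own token.
    pieces = re.findall(r"\w+|[^\w\s]", phrase)
    return [Token(text, index == 0) for index, text in enumerate(pieces)]
```

Under these assumed rules, the phrase "b-12 vitamin" yields the tokens "b", "-", "12" and "vitamin", with only "b" marked as initial.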

Optionally, the computer system may include a background dictionary electronically accessible by the computer code mechanism. If such a background dictionary is available, it may be searched by the computer code mechanism in order to determine a pron.

Optionally, the computer system may further include a pron guesser. The pron guesser, if present, is in electronic communication with the computer code mechanism and is capable of being applied to a token in order to determine a pron.
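The order in which these sources may be consulted for each token, and the assembly of the phrase pronunciation from the resulting prons, can be sketched in Python as follows. The sketch assumes that each source is a mapping from a token to a list of prons, that an empty string denotes a silent token, and that the pron guesser is a function returning a single pron; all names are hypothetical. The step of writing newly found prons back to the pron component list is omitted for brevity.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

PronTable = Dict[str, List[str]]  # assumed shape: token -> list of prons


def prons_for_token(
    token: str,
    component_list: PronTable,
    language_model: PronTable,
    background_dictionary: PronTable,
    guesser: Callable[[str], str],
) -> List[str]:
    """Consult each source in turn and fall back to the pron guesser."""
    for source in (component_list, language_model, background_dictionary):
        if token in source:
            return source[token]
    return [guesser(token)]


def phrase_prons(
    tokens: Sequence[str],
    component_list: PronTable,
    language_model: PronTable,
    background_dictionary: PronTable,
    guesser: Callable[[str], str],
) -> List[str]:
    """Combine the per-token prons, in sequence, into phrase pronunciations."""
    per_token = [
        prons_for_token(t, component_list, language_model, background_dictionary, guesser)
        for t in tokens
    ]
    # One phrase pronunciation per combination; silent (empty) prons are skipped.
    return [" ".join(p for p in combo if p) for combo in product(*per_token)]
```

Where a token has more than one pron, this sketch produces one phrase pronunciation for every combination of the alternatives.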

It will be apparent to one of skill in the art that described herein is a novel system and method for modifying a language model. While the invention has been described with reference to specific embodiments, it is not limited to these embodiments. The invention may be modified or varied in many ways and such modifications and variations as would be obvious to one of skill in the art are within the scope and spirit of the invention and are included within the scope of the following claims.

1.-17. (canceled)
 18. A method in a computer system comprising a language model, a background dictionary, and at least one preexisting pron component list, for adding phrase pronunciations to a language model, said method comprising the steps of: inputting at least one phrase to be added to the language model; determining if said at least one phrase is contained in said language model; if said at least one phrase is not contained in said language model, determining if said at least one phrase is contained in said background dictionary, and, if so, adding background dictionary pronunciation to said language model; if said at least one phrase is not contained in said language model or said background dictionary, parsing said at least one phrase into an ordered set of tokens in accordance with certain rules, sequentially associating prons with each said token of said ordered set of tokens, generating a pron derived phrase pronunciation from said prons, and adding said pron derived phrase pronunciation to said language model.
 19. A method, in accordance with claim 18, wherein the step of generating said pron derived phrase pronunciation, further comprises, for each said token of said ordered set of tokens: a) sequentially determining if each said associated pron is in said preexisting pron component list, and, if so, obtaining pronunciation from said preexisting pron component list; b) if said associated pron is not in said preexisting pron component list, determining if said associated pron is in a preexisting language model, and, if so, adding said language model pron to said preexisting pron component list; c) if said associated pron is not in preexisting pron component list or said preexisting language model, determining if said associated pron is in preexisting background dictionary, and, if so, adding said background dictionary pron to said preexisting pron component list; d) if said associated pron is not in said preexisting pron component list, said preexisting language model or said preexisting background dictionary, generating a guess pron, and adding said guess pron to said preexisting pron component list; e) if there is an additional token in said ordered set of tokens, repeating steps a) to d); and, f) if there are no additional tokens in said ordered set of tokens, generating said pron derived phrase pronunciation by combining said associated pron pronunciations as obtained from said preexisting pron component list, in sequence.
 20. A method for adding phrase pronunciations to a language model, in accordance with claim 18, wherein said pron component list includes punctuations or formatting that is present in the said at least one phrase but is silent in the pronunciation of said at least one phrase.
 21. A method for adding phrase pronunciations to a language model, in accordance with claim 19, wherein said pron component list is selected from a plurality of lists in accordance with the position of the said token within the said at least one phrase.
 22. A method for adding phrase pronunciations to a language model, in accordance with claim 18, wherein said certain rules comprise breaking up the said phrase into tokens at certain boundaries.
 23. A method for adding phrase pronunciations to a language model, in accordance with claim 22, wherein said certain boundaries comprise white spaces and/or punctuation.
 24. A method for adding phrase pronunciations to a language model, in accordance with claim 18, wherein said certain rules comprise looking for the longest match in said preexisting language model or said preexisting background dictionary.
 25. A method for adding phrase pronunciations to a language model, in accordance with claim 19, wherein said preexisting pron component lists comprise an initial pron component list and a non-initial pron component list.
 26. A tangible computer usable medium having computer readable instructions stored thereon for execution by a processor and comprising a language model, a background dictionary, and at least one preexisting pron component list to perform a method comprising: inputting at least one phrase to be added to the language model; determining if said at least one phrase is contained in said language model; if said at least one phrase is not contained in said language model, determining if said at least one phrase is contained in said background dictionary, and, if so, adding background dictionary pronunciation to said language model; if said at least one phrase is not contained in said language model or said background dictionary, parsing said at least one phrase into an ordered set of tokens in accordance with certain rules, sequentially associating prons with each said token of said ordered set of tokens, generating a pron derived phrase pronunciation from said prons, and adding said pron derived phrase pronunciation to said language model.
 27. A tangible computer usable medium, in accordance with claim 26, to perform a method wherein the step of generating said pron derived phrase pronunciation further comprises, for each said token of said ordered set of tokens: a) sequentially determining if each said associated pron is in said preexisting pron component list, and, if so, obtaining pronunciation from said preexisting pron component list; b) if said associated pron is not in said preexisting pron component list, determining if said associated pron is in a preexisting language model, and, if so, adding said language model pron to said preexisting pron component list; c) if said associated pron is not in preexisting pron component list or said preexisting language model, determining if said associated pron is in preexisting background dictionary, and, if so, adding said background dictionary pron to said preexisting pron component list; d) if said associated pron is not in said preexisting pron component list, said preexisting language model or said preexisting background dictionary, generating a guess pron, and adding said guess pron to said preexisting pron component list; e) if there is an additional token in said ordered set of tokens, repeating steps a) to d); and, f) if there are no additional tokens in said ordered set of tokens, generating said pron derived phrase pronunciation by combining said associated pron pronunciations as obtained from said preexisting pron component list, in sequence.
 28. A computer usable medium, in accordance with claim 27, wherein said pron component list includes punctuations or formatting that is present in the text but is silent in the pronunciation of said at least one phrase.
 29. A computer usable medium, in accordance with claim 27, wherein said pron component list is selected from a plurality of lists in accordance with the position of the said token within the said at least one phrase.
 30. A computer usable medium, in accordance with claim 26, wherein said certain rules comprise breaking up the said phrase into tokens at certain boundaries.
 31. A computer usable medium, in accordance with claim 30, wherein said certain boundaries comprise white spaces and/or punctuation.
 32. A computer usable medium, in accordance with claim 26, wherein said certain rules comprise looking for the longest match in said preexisting language model or said preexisting background dictionary.
 33. A computer usable medium, in accordance with claim 27, wherein said preexisting pron component lists comprise an initial pron component list and a non-initial pron component list. 